Optimal Quantum Circuits for General Two-Qubit Gates 
In order to demonstrate non-trivial quantum computations experimentally, such as the synthesis of arbitrary 
entangled states, it will be useful to understand how to decompose a desired quantum computation into the 
shortest possible sequence of one-qubit and two-qubit gates. We contribute to this effort by providing a method 
to construct an optimal quantum circuit for a general two-qubit gate that requires at most 3 CNOT gates and 
15 elementary one-qubit gates. Moreover, if the desired two-qubit gate corresponds to a purely real unitary 
transformation, we provide a construction that requires at most 2 CNOTs and 12 one-qubit gates. We then prove 
that these constructions are optimal with respect to the family of CNOT, y-rotation, z-rotation, and phase gates. 

PACS numbers: 03.67.Lx, 03.65.Fd, 03.65.Ud 



I. INTRODUCTION 

It is known that any n-qubit quantum computation can be
achieved using a sequence of one-qubit and two-qubit quantum
logic gates [1, 2]. However, even for two-qubit gates,
finding the optimal circuit with respect to a particular family
of gates is not easy [3]. This is unfortunate because, at
present, quantum computer experimentalists can only
achieve a handful of gate operations within the coherence time
of their physical systems [4]. Without a procedure for optimal
quantum circuit design, experimentalists might be unable to
demonstrate certain quantum computational milestones even
though they ought to be within reach. For example, a current
experimental goal is the synthesis of any two-qubit entangled
state [5]. Although it is known, in principle, how to synthesize
any such state [6], the resulting quantum circuits can be
suboptimal, requiring excessive numbers of CNOT gates, if
done injudiciously [7]. The current solution to this problem
uses rewrite rules to recognize and eliminate redundant gates.
However, a better solution would be to perform optimal design
from the outset.

In this paper we give a procedure for constructing an opti- 
mal quantum circuit for achieving a general two-qubit quan- 
tum computation, up to a global phase, which requires at most 
3 CNOT gates and 15 elementary one-qubit gates from the 
family $\{R_y, R_z\}$. We prove that this construction is optimal, 
in the sense that there is no smaller circuit, using the same 
family of gates, that achieves this operation. In addition, we 
show that if the unitary matrix corresponding to our desired 
gate is purely real, it can be achieved using at most 2 CNOT 
gates and 12 one-qubit gates. 

A flurry of recent results on gate-count minimization for
general two-qubit gates reports findings similar to ours. Vidal
and Dawson proved that 3 CNOTs are sufficient to implement
a general $U \in SU(4)$ and that two-qubit controlled-$V$
operations require at most 2 CNOTs [8]. Vatan and Williams
proved that any $U \in SU(4)$ requires at most 3 CNOTs and
16 elementary one-qubit $\{R_y, R_z\}$ gates, that any $U \in SO(4)$
(i.e., real gate) requires at most 2 CNOTs and 12 one-qubit
$\{R_y, R_z\}$ gates, and that these constructions are optimal [9].
Later, Shende, Markov, and Bullock reported similar results
on circuit complexity for $U \in SU(4)$, and specialized the
complexity bounds depending on which families of one-qubit
gates were being used [10]. Fundamentally, all these results
rest upon the decomposition of a general $U \in SU(4)$ given in
[11, 12] and used in the GQC quantum circuit compiler [13].

The remainder of the paper is organized as follows. After
introducing some notation in Section II, we discuss the magic
basis [11] in Section III and prove (in Theorems 1 and 2)
its most important property, namely, that real entangling
two-qubit operations become non-entangling in the magic basis.
We also prove (via the circuit shown in FIG. 2, first introduced
in [9]) that the magic basis transformations require at most
one CNOT to implement them explicitly. This is in contrast
to Fig. 3 in [15], which required three CNOTs. It turns out
that this compact quantum circuit for the magic basis
transformation is the cornerstone of our subsequent constructions
for generic two-qubit gates, and our proofs of their optimality.
In Section IV we present the first such construction, which
proves that any two-qubit gate in SO(4) can be implemented
in 12 elementary (i.e., $R_y$, $R_z$) gates and 2 CNOTs. Theorem 4
extends this result to any two-qubit gate in O(4) with
determinant equal to -1, and proves that any such gate requires
12 elementary gates and 3 CNOTs. In Section V these
results are generalized to the generic two-qubit gates in U(4),
and we provide an explicit construction that requires 15
elementary gates and 3 CNOTs. Finally, in Section VI we prove
that our construction for generic two-qubit gates is optimal by
showing that there is at least one gate in U(4), namely the
two-qubit SWAP gate, which cannot be implemented in fewer
than 3 CNOTs.

Throughout this paper we identify a quantum gate with the
unitary matrix that defines its operation. We take rotations
about the y- and z-axes, respectively $R_y(\theta)$ and $R_z(\alpha)$, as our
elementary one-qubit gates; i.e.,

\[
R_y(\theta) = \begin{pmatrix} \cos\frac{\theta}{2} & \sin\frac{\theta}{2} \\ -\sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix},
\qquad
R_z(\alpha) = \begin{pmatrix} e^{i\alpha/2} & 0 \\ 0 & e^{-i\alpha/2} \end{pmatrix}.
\]

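We also use three special one-qubit gates: the one-qubit
identity matrix $\mathbf{1}_2$, the Hadamard gate $H$, and the
phase gate $S$, defined as

\[
H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},
\qquad
S = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}.
\]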

We define two CNOT gates: CNOT1, a standard CNOT gate
with the control on the top qubit and the target on the bottom
qubit, and CNOT2, with the control and target qubits flipped.
Thus

\[
\mathrm{CNOT1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix},
\qquad
\mathrm{CNOT2} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}.
\]

We also use the two-qubit SWAP gate, which is defined as

\[
\mathrm{SWAP} = \mathrm{CNOT1} \cdot \mathrm{CNOT2} \cdot \mathrm{CNOT1}
= \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
\]
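
As a quick sanity check on this definition, the following Python sketch multiplies the three CNOT matrices and compares the result with the SWAP permutation matrix. The helper `matmul` and the list-of-rows representation are illustrative choices, not part of the paper; the basis order |00>, |01>, |10>, |11> with the top qubit written first is assumed.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Basis order |00>, |01>, |10>, |11>; the top qubit is the first label.
CNOT1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # control: top
CNOT2 = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]  # control: bottom
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]

# SWAP = CNOT1 . CNOT2 . CNOT1 (integer arithmetic, so equality is exact).
product = matmul(matmul(CNOT1, CNOT2), CNOT1)
print(product == SWAP)  # prints True
```

The same check with the roles of CNOT1 and CNOT2 exchanged also succeeds, which reflects the symmetry of SWAP under relabeling the two qubits.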

We use the notation $\Lambda_1(V)$ for the controlled-$V$ gate,
where $V \in U(2)$. Throughout this paper we assume that for
the $\Lambda_1(V)$ gate the control qubit is the first (top) qubit.
Therefore,

\[
\Lambda_1(V) = \begin{pmatrix} \mathbf{1}_2 & 0 \\ 0 & V \end{pmatrix}.
\]

In the special case of the $\Lambda_1(\sigma_z)$ gate, we use the notation
CZ. For any unitary matrix $U$, we denote its inverse, i.e., the
conjugate-transpose of $U$, by $U^*$.



III. MAGIC BASIS

There are different ways to define the magic basis [12, 14, 16].
Here we use the definition used in [14, 16]:

\[
M = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & i & 0 & 0 \\ 0 & 0 & i & 1 \\ 0 & 0 & i & -1 \\ 1 & -i & 0 & 0 \end{pmatrix}.
\]

The circuit of FIG. 1 implements this transformation.

FIG. 1: A circuit for implementing the magic gate M.

The following theorem presents the basic property of the
magic basis. This result is already known (see, e.g., [19]), and
we provide a proof for the sake of completeness.

Theorem 1. For every real orthogonal matrix $U \in SO(4)$,
the matrix of $U$ in the magic basis, i.e., $M \cdot U \cdot M^*$, is a tensor
product of two 2-dimensional special unitary matrices. In
other words, $M \cdot U \cdot M^* \in SU(2) \otimes SU(2)$.

Proof. We prove the theorem by showing that for every
$A \otimes B \in SU(2) \otimes SU(2)$, we have $M^* (A \otimes B) M \in SO(4)$.
It is well known that every matrix $A \in SU(2)$ can be written
as the product $R_z(\alpha) R_y(\theta) R_z(\beta)$, for some $\alpha$, $\beta$, and $\theta$.
Therefore any matrix $A \otimes B \in SU(2) \otimes SU(2)$ can be written
as a product of matrices of the form $V \otimes \mathbf{1}_2$ and $\mathbf{1}_2 \otimes V$,
where $V$ is either $R_y(\theta)$ or $R_z(\alpha)$. Thus the proof is complete
if $M^* (V \otimes \mathbf{1}_2) M$ and $M^* (\mathbf{1}_2 \otimes V) M$ are in SO(4).
Elementary algebra shows that this is the case.

Since the mapping $A \otimes B \mapsto M^* (A \otimes B) M$ is one-to-one
and the spaces $SU(2) \otimes SU(2)$ and SO(4) have the same
topological dimension, we conclude that this mapping is an
isomorphism between these two spaces. □

Note that the above theorem is not true for all orthogonal
matrices in O(4). In fact, for every matrix $U \in O(4)$, either
$\det(U) = 1$, for which the above theorem holds, or $\det(U) = -1$,
for which we have the following theorem.

Theorem 2. For every $U \in O(4)$ with $\det(U) = -1$, the
matrix $M U M^*$ is a tensor product of 2-dimensional unitary
matrices and one SWAP gate in the form of the following
decomposition: $M \cdot U \cdot M^* = (A \otimes B) \cdot \mathrm{SWAP} \cdot (\mathbf{1}_2 \otimes \sigma_z)$,
where $A, B \in U(2)$.

Proof. First note that $\det(\mathrm{CNOT1}) = -1$ and
$\det(U \cdot \mathrm{CNOT1}) = 1$. Then $M (\mathrm{CNOT1}) M^* =
(S^* \otimes S^*) \, \mathrm{SWAP} \, (\mathbf{1}_2 \otimes \sigma_z)$. Since $M U M^* =
(M (U \cdot \mathrm{CNOT1}) M^*) \cdot (M (\mathrm{CNOT1}) M^*)$, the theorem
follows from Theorem 1. □
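
The statements of this section can be spot-checked numerically. The sketch below is illustrative only and rests on stated assumptions: it takes the magic matrix in the Hill-Wootters form $M = \frac{1}{\sqrt{2}}$ [[1, i, 0, 0], [0, 0, i, 1], [0, 0, i, -1], [1, -i, 0, 0]], the phase gate $S = \mathrm{diag}(1, i)$, and the rotation conventions $R_z(\alpha) = \mathrm{diag}(e^{i\alpha/2}, e^{-i\alpha/2})$ and $R_y(\theta)$ with entries $\cos\frac{\theta}{2}, \sin\frac{\theta}{2}, -\sin\frac{\theta}{2}, \cos\frac{\theta}{2}$. All helper names (`matmul`, `chain`, `kron`, `dagger`, `close`) are hypothetical. It checks that S on both qubits, H on the bottom qubit, and one CNOT2 reproduce M, that $M^* (A \otimes B) M$ is real and orthogonal for a random pair A, B, and that the identity used in the proof of Theorem 2 holds.

```python
import cmath
import math
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def chain(*ms):
    """Matrix product ms[0] * ms[1] * ... in the order written."""
    out = ms[0]
    for m in ms[1:]:
        out = matmul(out, m)
    return out

def kron(a, b):
    """Tensor product; the first factor acts on the top qubit."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def rz(alpha):  # assumed convention: diag(e^{i alpha/2}, e^{-i alpha/2})
    return [[cmath.exp(0.5j * alpha), 0], [0, cmath.exp(-0.5j * alpha)]]

def ry(theta):  # assumed convention: [[cos, sin], [-sin, cos]] of theta/2
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, s], [-s, c]]

r = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
H = [[r, r], [r, -r]]
S = [[1, 0], [0, 1j]]
Z = [[1, 0], [0, -1]]
CNOT1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
CNOT2 = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
M = [[r, 1j * r, 0, 0], [0, 0, 1j * r, r], [0, 0, 1j * r, -r], [r, -1j * r, 0, 0]]

# One-CNOT circuit: S on both qubits, H on the bottom qubit, then CNOT2.
circuit_ok = close(chain(CNOT2, kron(I2, H), kron(S, S)), M)

# Direction used in the proof of Theorem 1: M* (A (x) B) M is real orthogonal.
rng = random.Random(0)
A = chain(*(g(rng.uniform(0, 2 * math.pi)) for g in (rz, ry, rz)))
B = chain(*(g(rng.uniform(0, 2 * math.pi)) for g in (rz, ry, rz)))
O = chain(dagger(M), kron(A, B), M)
is_real = all(abs(x.imag) < 1e-9 for row in O for x in row)
transpose = [list(col) for col in zip(*O)]
is_orthogonal = close(matmul(O, transpose), kron(I2, I2))

# Identity used in the proof of Theorem 2.
thm2_ok = close(chain(M, CNOT1, dagger(M)),
                chain(kron(dagger(S), dagger(S)), SWAP, kron(I2, Z)))
print(circuit_ok, is_real, is_orthogonal, thm2_ok)
```

A single random pair is of course not a proof; the check only guards against sign or ordering mistakes in the conventions assumed above.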



IV. REALIZING TWO-QUBIT GATES FROM O(4)

Let $U \in SO(4)$. Then Theorem 1 shows that $M U M^* =
A \otimes B$, where $A, B \in SU(2)$. Therefore, $U = M^* (A \otimes B) M$.
We use the circuit of FIG. 1 for computing the magic
basis transform $M$ to obtain a circuit for computing the unitary
operation $U$. This circuit can be simplified by using the
decompositions $S = e^{i\pi/4} R_z(\pi/2)$ and $H = \sigma_z R_y(\pi/2)$.
Note that $\mathbf{1}_2 \otimes \sigma_z$ and the CNOT2 gates commute, and the
overall phases $e^{i\pi/4}$ and $e^{-i\pi/4}$ from $S$ and $S^*$ cancel out.
Hence we obtain the circuit of FIG. 2 for computing a general
two-qubit gate from SO(4). Thus we have proved the following
theorem.

Theorem 3. Every two-qubit quantum gate in SO(4) can be
realized by a circuit consisting of 12 elementary one-qubit
gates and 2 CNOT gates.

FIG. 2: A circuit for implementing a general transform in SO(4),
where $A, B \in SU(2)$, $S_1 = R_z(\pi/2)$ and $R_1 = R_y(\pi/2)$.



A similar argument and Theorem 2 imply the following
construction for gates from O(4) with determinant equal to
-1.

Theorem 4. Every two-qubit quantum gate in O(4) with determinant
equal to -1 can be realized by a circuit consisting of 12
elementary gates, 2 CNOT gates, and one SWAP gate (see
FIG. 3).

FIG. 3: A circuit for implementing a transform in O(4) with determinant
equal to -1, where $A, B \in SU(2)$, $S_1 = R_z(\pi/2)$ and $R_1 = R_y(\pi/2)$.

Next, we generalize these results to construct circuits for gates 
in U(4). 



V. REALIZING TWO-QUBIT GATES FROM U(4) 

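It is known that every $U \in U(4)$ can be written as

\[
U = (A_1 \otimes A_2) \cdot N(\alpha, \beta, \gamma) \cdot (A_3 \otimes A_4), \tag{1}
\]

where $A_j \in U(2)$ and

\[
N(\alpha, \beta, \gamma) = \exp\bigl( i (\alpha\, \sigma_x \otimes \sigma_x + \beta\, \sigma_y \otimes \sigma_y + \gamma\, \sigma_z \otimes \sigma_z) \bigr),
\]

for $\alpha, \beta, \gamma \in \mathbb{R}$ (see, e.g., O d Hi). Note that if
$U \in SU(4)$, then we can choose all operations $A_j$ in (1) from
SU(2). Our construction is based on constructing an optimal
circuit for computing $N(\alpha, \beta, \gamma)$. To this end, we first note
that $D = M^* \cdot N \cdot M$ is a diagonal matrix of the form

\[
D = \mathrm{diag}\bigl( e^{i(\alpha - \beta + \gamma)},\; e^{-i(\alpha - \beta - \gamma)},\; e^{i(\alpha + \beta - \gamma)},\; e^{-i(\alpha + \beta + \gamma)} \bigr).
\]

Therefore, $N(\alpha, \beta, \gamma) = M \cdot D \cdot M^*$. Utilizing the circuit
of FIG. 1 for $M$, we get the circuit of FIG. 4 for computing
$N(\alpha, \beta, \gamma)$. Note that $(S \otimes S) \cdot D \cdot (S^* \otimes S^*) = D$. Then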
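we substitute the right-hand side Hadamard gate of FIG. 4 by
3 gates, using the following identity: $\mathbf{1}_2 \otimes H = \mathrm{CNOT1} \cdot
(\mathbf{1}_2 \otimes H) \cdot \mathrm{CZ}$. Now, the matrix $D_1 = \mathrm{CZ} \cdot D$ is a diagonal
matrix, and

\[
(\mathbf{1}_2 \otimes H) \cdot D_1 \cdot (\mathbf{1}_2 \otimes H) = \Lambda_1(V_2) \cdot (\mathbf{1}_2 \otimes V_1), \tag{2}
\]

where

\[
V_1 = \begin{pmatrix} e^{i\gamma} \cos(\alpha - \beta) & i e^{i\gamma} \sin(\alpha - \beta) \\ i e^{i\gamma} \sin(\alpha - \beta) & e^{i\gamma} \cos(\alpha - \beta) \end{pmatrix},
\qquad
V_2 = \begin{pmatrix} i e^{-2i\gamma} \sin 2\beta & e^{-2i\gamma} \cos 2\beta \\ e^{-2i\gamma} \cos 2\beta & i e^{-2i\gamma} \sin 2\beta \end{pmatrix}.
\]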



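We have the following decompositions for $V_1$ and $\Lambda_1(V_2)$ (see
also Q):

\[
V_1 = e^{i\gamma} R_z(-\tfrac{\pi}{2}) \cdot R_y(2(\beta - \alpha)) \cdot R_z(\tfrac{\pi}{2}), \tag{3}
\]

and

\[
\begin{aligned}
\Lambda_1(V_2) = {} & e^{i(\frac{\pi}{4} - \gamma)} \bigl(\mathbf{1}_2 \otimes R_z(-\tfrac{\pi}{2})\bigr) \cdot \mathrm{CNOT1} \\
& \cdot \bigl(\mathbf{1}_2 \otimes R_y(2\beta - \tfrac{\pi}{2})\bigr) \cdot \mathrm{CNOT1} \\
& \cdot \bigl(R_z(2\gamma - \tfrac{\pi}{2}) \otimes \bigl(R_y(\tfrac{\pi}{2} - 2\beta)\, R_z(\tfrac{\pi}{2})\bigr)\bigr).
\end{aligned} \tag{4}
\]

By utilizing the equations (2)-(4), we can convert the circuit
of FIG. 4 to the circuit of FIG. 5.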
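
Equations (2)-(4) are easy to get wrong by a sign, so a numerical spot check may be useful. The following Python sketch is illustrative, not the authors' code. It assumes the conventions $R_z(t) = \mathrm{diag}(e^{it/2}, e^{-it/2})$ and $R_y(t)$ with entries $\cos\frac{t}{2}, \sin\frac{t}{2}, -\sin\frac{t}{2}, \cos\frac{t}{2}$, takes $D$, $V_1$, and $V_2$ in the forms $D = \mathrm{diag}(e^{i(\alpha-\beta+\gamma)}, e^{-i(\alpha-\beta-\gamma)}, e^{i(\alpha+\beta-\gamma)}, e^{-i(\alpha+\beta+\gamma)})$, $V_1 = e^{i\gamma}$ [[cos(α-β), i sin(α-β)], [i sin(α-β), cos(α-β)]], $V_2 = e^{-2i\gamma}$ [[i sin 2β, cos 2β], [cos 2β, i sin 2β]], and uses arbitrary test angles. All helper names are hypothetical.

```python
import cmath
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def chain(*ms):
    """Matrix product ms[0] * ms[1] * ... in the order written."""
    out = ms[0]
    for m in ms[1:]:
        out = matmul(out, m)
    return out

def kron(a, b):
    """Tensor product; the first factor acts on the top qubit."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def scale(z, a):
    return [[z * x for x in row] for row in a]

def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def phase(x):
    return cmath.exp(1j * x)

def rz(t):  # assumed convention
    return [[phase(t / 2), 0], [0, phase(-t / 2)]]

def ry(t):  # assumed convention
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, s], [-s, c]]

def diag4(d):
    return [[d[i] if i == j else 0 for j in range(4)] for i in range(4)]

def controlled(v):
    """Lambda_1(V): apply V to the bottom qubit when the top qubit is 1."""
    return [[1, 0, 0, 0], [0, 1, 0, 0],
            [0, 0, v[0][0], v[0][1]], [0, 0, v[1][0], v[1][1]]]

al, be, ga = 0.3, 0.7, 1.1  # arbitrary test angles (alpha, beta, gamma)
r = 1 / math.sqrt(2)
half_pi = math.pi / 2
I2 = [[1, 0], [0, 1]]
H = [[r, r], [r, -r]]
CNOT1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
CZ = diag4([1, 1, 1, -1])
D = diag4([phase(al - be + ga), phase(-(al - be - ga)),
           phase(al + be - ga), phase(-(al + be + ga))])
D1 = matmul(CZ, D)

c, s = math.cos(al - be), math.sin(al - be)
V1 = scale(phase(ga), [[c, 1j * s], [1j * s, c]])
c2, s2 = math.cos(2 * be), math.sin(2 * be)
V2 = scale(phase(-2 * ga), [[1j * s2, c2], [c2, 1j * s2]])

# Equation (2): conjugating D1 by H on the bottom qubit.
eq2 = close(chain(kron(I2, H), D1, kron(I2, H)),
            matmul(controlled(V2), kron(I2, V1)))
# Equation (3): Euler-angle form of V1.
eq3 = close(scale(phase(ga), chain(rz(-half_pi), ry(2 * (be - al)), rz(half_pi))), V1)
# Equation (4): two-CNOT form of the controlled-V2 gate.
eq4 = close(scale(phase(math.pi / 4 - ga),
                  chain(kron(I2, rz(-half_pi)), CNOT1,
                        kron(I2, ry(2 * be - half_pi)), CNOT1,
                        kron(rz(2 * ga - half_pi),
                             matmul(ry(half_pi - 2 * be), rz(half_pi))))),
            controlled(V2))
print(eq2, eq3, eq4)
```

Products are written in matrix order, so the rightmost factor in each `chain` call is the gate applied first in the circuit.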
FIG. 4: A circuit for implementing $N(\alpha, \beta, \gamma)$; first version.

FIG. 5: A circuit for implementing $N(\alpha, \beta, \gamma)$; second version. Here
$S_1 = R_z(\pi/2)$, $S_2 = R_z(2\gamma - \pi/2)$, $T_1 = R_y(\pi/2 - 2\alpha)$, and $T_2 =
R_y(2\beta - \pi/2)$.

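Now we focus on the sequence $\mathrm{CNOT1} \cdot (\mathbf{1}_2 \otimes R_z(-\pi/2)) \cdot
\mathrm{CNOT1}$ of operations. We have the following identity:

\[
\mathrm{CNOT1} \cdot (\mathbf{1}_2 \otimes R_z(\theta)) \cdot \mathrm{CNOT1} =
\mathrm{CNOT2} \cdot (R_z(\theta) \otimes \mathbf{1}_2) \cdot \mathrm{CNOT2}.
\]

After applying this rule, the two consecutive CNOT2 gates on
the right-hand side of the circuit reduce to the identity. Also
note that, on the left-hand side of the circuit, where the
$R_z$ rotation acts on the control qubit of CNOT2, we can apply the
rule

\[
(\mathbf{1}_2 \otimes R_z(\theta)) \cdot \mathrm{CNOT2} = \mathrm{CNOT2} \cdot (\mathbf{1}_2 \otimes R_z(\theta)).
\]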
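
The two rewrite rules used in this step can be checked directly. The sketch below is a hedged illustration with hypothetical helper names; it assumes $R_z(\theta) = \mathrm{diag}(e^{i\theta/2}, e^{-i\theta/2})$, CNOT1 with its control on the top qubit, and CNOT2 with its control on the bottom qubit. It also shows why the commutation rule needs the rotation to sit on the control qubit: an $R_z$ on the bottom qubit commutes with CNOT2 but not with CNOT1.

```python
import cmath

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def chain(*ms):
    """Matrix product ms[0] * ms[1] * ... in the order written."""
    out = ms[0]
    for m in ms[1:]:
        out = matmul(out, m)
    return out

def kron(a, b):
    """Tensor product; the first factor acts on the top qubit."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def rz(t):  # assumed convention
    return [[cmath.exp(0.5j * t), 0], [0, cmath.exp(-0.5j * t)]]

I2 = [[1, 0], [0, 1]]
CNOT1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # control: top
CNOT2 = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]  # control: bottom
theta = 0.83  # arbitrary test angle

# Rule 1: a z-rotation sandwiched between CNOTs can be moved to the other wire.
rule1 = close(chain(CNOT1, kron(I2, rz(theta)), CNOT1),
              chain(CNOT2, kron(rz(theta), I2), CNOT2))

# Rule 2: a z-rotation on the control qubit commutes with the CNOT.
rule2 = close(matmul(kron(I2, rz(theta)), CNOT2),
              matmul(CNOT2, kron(I2, rz(theta))))

# On the target qubit the same rotation does not commute with the CNOT.
target_commutes = close(matmul(kron(I2, rz(theta)), CNOT1),
                        matmul(CNOT1, kron(I2, rz(theta))))
print(rule1, rule2, target_commutes)  # prints True True False
```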

Thus the circuit of FIG. 5 can be converted to the circuit of
FIG. 6. Note that the operation defined by this circuit has
determinant equal to -1; thus we need to add a global $e^{i\pi/4}$ phase
to get the special unitary operation $N(\alpha, \beta, \gamma)$ exactly. Now,
utilizing the circuit of FIG. 6 and the canonical decomposition
(1), we can obtain a circuit to realize the operation $U \in U(4)$.
Note that in this process, the left and right-hand side operations
$R_z(\pi/2)$ and $R_z(-\pi/2)$ of FIG. 6 will be "absorbed" by
adjacent $A_j$. The final result is the circuit of FIG. 7, and we
have proved the following theorem.
FIG. 6: A circuit for implementing $N(\alpha, \beta, \gamma)$; third version. A
global $e^{i\pi/4}$ phase is missing here.

Theorem 5. Every two-qubit quantum gate in U(4) can be
realized, up to a global phase, by a circuit consisting of 15
elementary one-qubit gates and 3 CNOT gates.

FIG. 7: A circuit for implementing a transform in U(4).
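
To make the three-CNOT circuit for $N(\alpha, \beta, \gamma)$ concrete, the Python sketch below writes out one circuit obtained by following the rewriting steps of this section and compares it with the exponential definition of $N$. It is a reconstruction under stated assumptions, not a transcription of FIG. 6: the gate order is derived from equations (2)-(4) and the two CNOT rewrite rules, with $R_z(t) = \mathrm{diag}(e^{it/2}, e^{-it/2})$ and $R_y(t)$ having entries $\cos\frac{t}{2}, \sin\frac{t}{2}, -\sin\frac{t}{2}, \cos\frac{t}{2}$. The exact $N$ is built from the fact that $\sigma_x \otimes \sigma_x$, $\sigma_y \otimes \sigma_y$, and $\sigma_z \otimes \sigma_z$ commute and square to the identity. Helper names are hypothetical.

```python
import cmath
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def chain(*ms):
    """Matrix product ms[0] * ms[1] * ... in the order written."""
    out = ms[0]
    for m in ms[1:]:
        out = matmul(out, m)
    return out

def kron(a, b):
    """Tensor product; the first factor acts on the top qubit."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def scale(z, a):
    return [[z * x for x in row] for row in a]

def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def rz(t):  # assumed convention
    return [[cmath.exp(0.5j * t), 0], [0, cmath.exp(-0.5j * t)]]

def ry(t):  # assumed convention
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, s], [-s, c]]

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
CNOT1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # control: top
CNOT2 = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]  # control: bottom
HALF_PI = math.pi / 2

def pauli_rotation(angle, p):
    """exp(i * angle * P (x) P) = cos(angle) I + i sin(angle) P (x) P."""
    pp = kron(p, p)
    c, s = math.cos(angle), math.sin(angle)
    return [[c * (i == j) + 1j * s * pp[i][j] for j in range(4)] for i in range(4)]

def n_exact(al, be, ga):
    # The three exponents commute, so the exponential factorizes.
    return chain(pauli_rotation(al, X), pauli_rotation(be, Y), pauli_rotation(ga, Z))

def n_circuit(al, be, ga):
    # Rightmost factor is applied first; three CNOTs and a global phase e^{i pi/4}.
    return scale(cmath.exp(1j * math.pi / 4),
                 chain(kron(rz(-HALF_PI), I2), CNOT2,
                       kron(I2, ry(2 * be - HALF_PI)), CNOT1,
                       kron(rz(2 * ga - HALF_PI), ry(HALF_PI - 2 * al)), CNOT2,
                       kron(I2, rz(HALF_PI))))

angles = [(0.0, 0.0, 0.0), (0.3, 0.7, 1.1), (-1.2, 2.5, 0.4)]
all_match = all(close(n_circuit(*t), n_exact(*t)) for t in angles)
print(all_match)
```

In this form the outer $R_z(\pm\pi/2)$ rotations sit at the two ends of the circuit, which is what allows them to be merged into the neighboring one-qubit gates $A_j$ of decomposition (1).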

The construction given in Theorem 5 is optimal. To prove
this it is sufficient to place a lower bound on the number of
CNOT gates needed to implement a generic two-qubit gate.
This is because [15] already shows that we need at least 15
elementary one-qubit gates to implement a generic two-qubit
gate. So we need only concern ourselves with the minimum
required number of CNOT gates. We prove in the next section
that three CNOT gates are needed in the general case.

We wish to emphasize that our decomposition is constructive.
To see this, note that we can use Kraus and Cirac's
methods [12] to decompose any desired two-qubit gate into
the form given by equation (1). All parameters in this
decomposition may be determined constructively. Thereafter, it
only remains to reduce the $N(\alpha, \beta, \gamma)$ matrix to an explicit
quantum circuit. This we can do immediately using the circuit
template in FIG. 6. By concatenating these two processes
we can find the optimal circuit for any generic two-qubit
operation constructively.



VI. THREE CNOT GATES ARE NEEDED 

To show that the construction of Theorem 5 is optimal, we
prove that there is at least one gate in U(4), namely the two-qubit
SWAP gate, a real unitary matrix having a determinant of
-1, which requires no fewer than 3 CNOT gates.

In the proof of the following theorem we utilize the notion
of entangling power introduced in [17]. For a unitary operation
$U \in U(4)$, the entangling power of $U$ is defined as

\[
\mathrm{EP}(U) = \underset{|\psi_1\rangle \otimes |\psi_2\rangle}{\mathrm{average}} \bigl[ E\bigl( U\, |\psi_1\rangle \otimes |\psi_2\rangle \bigr) \bigr],
\]

where the average is over all product states $|\psi_1\rangle \otimes |\psi_2\rangle \in \mathbb{C}^2 \otimes \mathbb{C}^2$
distributed according to the uniform distribution (in general,
we can define EP with regard to any distribution, but here
we only consider the uniform distribution). In the above formula
$E$ is the linear entropy entanglement measure defined
for $|\psi\rangle \in \mathbb{C}^4$ as follows:

\[
E(|\psi\rangle) = 1 - \mathrm{tr}_1\, \rho^2,
\]

where $\rho = \mathrm{tr}_2 |\psi\rangle\langle\psi|$ and $\mathrm{tr}_j$ denotes the result of tracing
out the $j$th qubit. Note that $0 \le E(|\psi\rangle) \le \frac{1}{2}$, and the lower
or upper bound is attained if $|\psi\rangle$ is a product state or a maximally
entangled state, respectively. In [17] the following simple
formula for calculating EP is presented:

\[
\mathrm{EP}(U) = \frac{5}{9} - \frac{1}{36} \Bigl[ \langle U^{\otimes 2}, T_{1,3} U^{\otimes 2} T_{1,3} \rangle +
\langle (\mathrm{SWAP} \cdot U)^{\otimes 2}, T_{1,3} (\mathrm{SWAP} \cdot U)^{\otimes 2} T_{1,3} \rangle \Bigr],
\]

where the Hilbert-Schmidt scalar product $\langle A, B \rangle$ is defined as
$\langle A, B \rangle = \mathrm{tr}(A^\dagger B)$ and the permutation $T_{1,3}$ on $\mathbb{C}^2 \otimes \mathbb{C}^2 \otimes
\mathbb{C}^2 \otimes \mathbb{C}^2$ is the transposition $T_{1,3} |a, b, c, d\rangle = |c, b, a, d\rangle$ on
the system of 4 qubits.

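We will utilize the following basic properties of the function
EP.

• For every $U \in U(4)$ we have $0 \le \mathrm{EP}(U) \le \frac{2}{9}$.

• For every $A, B \in U(2)$ we have $\mathrm{EP}(A \otimes B) = 0$.

• For every $U \in U(4)$ and $A, B \in U(2)$ we have $\mathrm{EP}((A \otimes B) \cdot U) = \mathrm{EP}(U \cdot (A \otimes B)) = \mathrm{EP}(U)$.

• $\mathrm{EP}(U) = \mathrm{EP}(U^*)$.

• $\mathrm{EP}(\mathrm{CNOT}) = \frac{2}{9}$ and $\mathrm{EP}(\mathrm{SWAP}) = 0$.

We will also use the simple fact that SWAP cannot be written
as $\mathrm{SWAP} = A \otimes B$, where $A, B \in U(2)$.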
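
The trace formula for EP is straightforward to evaluate numerically. The following Python sketch is an illustration under stated assumptions: it takes the formula in the form $\mathrm{EP}(U) = \frac{5}{9} - \frac{1}{36}[\ldots]$ with $\langle A, B \rangle = \mathrm{tr}(A^\dagger B)$, orders the four qubits so that index $8a + 4b + 2c + d$ labels $|a, b, c, d\rangle$, and uses hypothetical helper names. It reproduces the listed values for CNOT and SWAP and gives zero for a sample tensor-product gate.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Tensor product; the first factor acts on the leading qubits."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def t13():
    """Permutation matrix for |a,b,c,d> -> |c,b,a,d> on four qubits."""
    t = [[0] * 16 for _ in range(16)]
    for col in range(16):
        a, b, c, d = (col >> 3) & 1, (col >> 2) & 1, (col >> 1) & 1, col & 1
        t[(c << 3) | (b << 2) | (a << 1) | d][col] = 1
    return t

def hs(a, b):
    """Hilbert-Schmidt product tr(A^dagger B), computed entrywise."""
    return sum(complex(x).conjugate() * y
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))

r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]
S = [[1, 0], [0, 1j]]
CNOT1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]

def entangling_power(u):
    """Evaluate 5/9 - (1/36) * (sum of the two Hilbert-Schmidt terms)."""
    t = t13()
    total = 0
    for v in (u, matmul(SWAP, u)):
        vv = kron(v, v)  # v acting on qubits (1,2) and on qubits (3,4)
        total += hs(vv, matmul(matmul(t, vv), t))
    return 5 / 9 - total.real / 36

print(entangling_power(CNOT1))       # close to 2/9
print(entangling_power(SWAP))        # close to 0
print(entangling_power(kron(H, S)))  # close to 0 for a tensor product
```

For permutation gates such as CNOT and SWAP each Hilbert-Schmidt term simply counts the fixed points of a permutation of the 16 basis states, which is why these values come out as exact rationals.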

Theorem 6. To compute the SWAP gate, at least 3 CNOT gates are
needed.

Proof. We construct a proof by contradiction. Suppose that
there is a circuit that computes SWAP and consists of fewer than
three CNOT gates. We consider two possible cases.

Case 1. Suppose that SWAP is computed by a circuit consisting
of two CNOT gates. We substitute each CNOT gate by a small
subcircuit in terms of the CZ (controlled-$\sigma_z$) gate; i.e.,

\[
\mathrm{CNOT} = (\mathbf{1}_2 \otimes H) \cdot \mathrm{CZ} \cdot (\mathbf{1}_2 \otimes H).
\]

Then by utilizing the following commutation rules

\[
\mathrm{CZ} \cdot (\mathbf{1}_2 \otimes R_z(t)) = (\mathbf{1}_2 \otimes R_z(t)) \cdot \mathrm{CZ},
\qquad
\mathrm{CZ} \cdot (R_z(t) \otimes \mathbf{1}_2) = (R_z(t) \otimes \mathbf{1}_2) \cdot \mathrm{CZ},
\]

we obtain the simplified circuit of FIG. 8 for computing the
SWAP gate. Note that in this figure we choose the top (first)
qubit as the control qubit for the CZ gates, but we could
choose the other qubit as the control qubit as well, since the
action of the CZ gate is not changed by switching the control
and target qubits. Now, let

\[
U = \mathrm{CZ} \cdot (R_y(a) \otimes R_y(b)) \cdot \mathrm{CZ}.
\]

FIG. 8: A circuit consisting of two CNOT gates in terms of CZ gates.



Then

\[
\mathrm{EP}(U) = \mathrm{EP}(\mathrm{SWAP})
= \frac{1}{18} \bigl( 3 - \cos(2a) - \cos(2b) - \cos(2a) \cos(2b) \bigr) = 0.
\]

Therefore, $a, b \in \{0, \pi\}$. Thus we have the following four
possible cases for the unitary operation $U$:

• if $a = b = 0$, then $U = \mathbf{1}_4$;

• if $a = 0$, $b = \pi$, then $U = \sigma_z \otimes R_y(\pi)$;

• if $a = \pi$, $b = 0$, then $U = R_y(\pi) \otimes \sigma_z$;

• if $a = b = \pi$, then $U = -\sigma_x \otimes \sigma_x$.

In each case, we conclude that $\mathrm{SWAP} = V_1 \otimes V_2$, for some
$V_1, V_2 \in U(2)$, which is a contradiction.

Case 2. Suppose that SWAP is computed by a circuit consisting
of only one CNOT gate; for example

\[
\mathrm{SWAP} = (A_1 \otimes A_2) \cdot \mathrm{CNOT1} \cdot (A_3 \otimes A_4),
\]

where $A_j \in U(2)$. Then $\mathrm{EP}(\mathrm{SWAP}) = \mathrm{EP}(\mathrm{CNOT})$, which
again is a contradiction. □
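
The four cases of Case 1 can be confirmed by direct multiplication. The sketch below is illustrative and uses hypothetical helper names; it assumes $R_y(t)$ with entries $\cos\frac{t}{2}, \sin\frac{t}{2}, -\sin\frac{t}{2}, \cos\frac{t}{2}$ and $\mathrm{CZ} = \mathrm{diag}(1, 1, 1, -1)$. It builds $U = \mathrm{CZ} \cdot (R_y(a) \otimes R_y(b)) \cdot \mathrm{CZ}$ for each pair of angles and compares it with the stated tensor product. It also evaluates the trigonometric expression $3 - \cos(2a) - \cos(2b) - \cos(2a)\cos(2b)$, which is the factor that must vanish for EP(U) to be zero.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Tensor product; the first factor acts on the top qubit."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def scale(z, a):
    return [[z * x for x in row] for row in a]

def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def ry(t):  # assumed convention
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, s], [-s, c]]

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]

def two_cz(a, b):
    """U = CZ . (Ry(a) (x) Ry(b)) . CZ, the core of the two-CNOT circuit."""
    return matmul(matmul(CZ, kron(ry(a), ry(b))), CZ)

def bracket(a, b):
    """Trigonometric factor of EP(U); zero only when cos(2a) = cos(2b) = 1."""
    return 3 - math.cos(2 * a) - math.cos(2 * b) - math.cos(2 * a) * math.cos(2 * b)

pi = math.pi
expected = {
    (0.0, 0.0): kron(I2, I2),
    (0.0, pi): kron(Z, ry(pi)),
    (pi, 0.0): kron(ry(pi), Z),
    (pi, pi): scale(-1, kron(X, X)),
}
all_match = all(close(two_cz(a, b), w) for (a, b), w in expected.items())
zeros = [abs(bracket(a, b)) < 1e-12 for (a, b) in expected]
print(all_match, zeros, bracket(0.4, 0.0) > 0)
```

Since every one of the four matrices is a tensor product of one-qubit gates, surrounding it with further one-qubit gates can never produce SWAP, which is the contradiction used in the proof.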

VII. CONCLUSION 

In this paper we prove tight bounds on the numbers 
of one-qubit gates and CNOT gates needed to implement 
generic two-qubit quantum computations. In addition, we 
give a constructive procedure for finding such decomposi- 
tions, which uses the Kraus-Cirac decomposition to find the 
core entangling operation underlying the two-qubit gate, i.e., 
$N(\alpha, \beta, \gamma)$, and then substitutes the discovered parameter 